Tree k-Grammar Models for Natural Language Modelling and Parsing
نویسندگان
چکیده
In this paper, we compare three different approaches to build a probabilistic context-free grammar for natural language parsing from a tree bank corpus: (1) a model that simply extracts the rules contained in the corpus and counts the number of occurrences of each rule; (2) a model that also stores information about the parent node’s category, and (3) a model that estimates the probabilities according to a generalized k-gram scheme for trees with k = 3. The last model allows for faster parsing and decreases considerably the perplexity of test samples.
منابع مشابه
A Comparison of PCFG Models
In this paper, we compare three different approaches to build a probabilistic context-free grammar for natural language parsing from a tree bank corpus: 1) a model that simply extracts the rules contained in the corpus and counts the number of occurrences of each rule 2) a model that also stores information about the parent node's category and, 3) a model that estimates the probabilities accord...
متن کاملComplexity and Expressiveness for Formal Structures in Natural Language Processing
The formalized and algorithmic study of human language within the field of Natural Language Processing (NLP) has motivated much theoretical work in the related field of formal languages, in particular the subfields of grammar and automata theory. Motivated and informed by NLP, the papers in this thesis explore the connections between expressibility – that is, the ability for a formal system to ...
متن کاملSmoothing Techniques for Tree-k-Grammar-Based Natural Language Modeling
In a previous work, a new probabilistic context-free grammar (PCFG) model for natural language parsing derived from a tree bank corpus has been introduced. The model estimates the probabilities according to a generalized k-grammar scheme for trees. It allows for faster parsing, decreases considerably the perplexity of the test samples and tends to give more structured and refined parses. Howeve...
متن کاملA Lexicalized Tree-Adjoining Grammar for Vietnamese
In this paper, we present the first sizable grammar built for Vietnamese using LTAG, developed over the past two years, named vnLTAG. This grammar aims at modelling written language and is general enough to be both applicationand domain-independent. It can be used for the morpho-syntactic tagging and syntactic parsing of Vietnamese texts, as well as text generation. We then present a robust par...
متن کاملAn improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کامل